Profiler............................................Bill Morgan

For the last several months, I've been intrigued by an article in the August '83 issue of Byte Magazine, "Chisel Your Code with a Profiler", by Dennis Leas and Paul Wintz.  They describe a utility program, called a profiler, which measures where an executing program is spending most of its time.  The largest application for such a tool is testing programs compiled from a high-level language.  Typically such a program will spend nearly all of its time executing only a small section of the code.  Leas and Wintz claim that the proportion is about 90% of the time in about 10% of the code.  With a profiler you can identify the bottleneck and speed up the whole program by recoding one small piece.

The profiler first divides your program into sixteen "bins".  It then interrupts your program periodically and reads the stored Program Counter from the stack.  If that address falls inside the area you want to measure, the profiler increments the counter for the bin that contains it.  The profiler then returns control to your program until the next interrupt occurs.  When the testing period is finished you can display the counters and spot your problem areas.
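To make the idea concrete, here is a rough simulation in Python (not the 6502 code of Profiler itself).  The address range, the "hot" loop at $0900-$091F, and the 90/10 split of samples are all invented for the illustration.

```python
import random

def profile(samples, start, end, bins=16):
    """Count how many sampled program-counter values land in each bin."""
    step = (end - start) // bins
    counts = [0] * bins
    for pc in samples:
        if not (start <= pc <= end):
            continue                      # outside the measured area: ignore
        index = min((pc - start) // step, bins - 1)
        counts[index] += 1
    return counts

random.seed(1)
# Hypothetical program: 90 samples in a small hot loop, 10 scattered elsewhere.
hot = [random.randint(0x0900, 0x091F) for _ in range(90)]
cold = [random.randint(0x0800, 0x0FFF) for _ in range(10)]
counts = profile(hot + cold, 0x0800, 0x0FFF)
print(counts)
```

One bin collects nearly all of the hits, and that bin is the piece of code worth recoding.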

An essential part of this tool is a source of regularly timed interrupts.  The best place to get a timed interrupt signal is from a suitable clock card.  All of the clock cards have some provision for generating interrupts, usually at intervals of about 1 millisecond or 1 second.  Some also have available 64 Hz or 256 Hz frequencies, or other values.  Check the documentation with your clock to see exactly how to use its interrupt features.

The interrupt timing you want to use will in part be a function of how long your program, or subroutine, will run.  If you're profiling a sort that takes several minutes to complete, a 1000 Hz interrupt will overflow the counters long before the sort has made any significant progress.  If the routine takes a short time, a 60 Hz clock won't catch enough hits to be meaningful.  Leas and Wintz use a 6 Hz signal picked up from their disk drives to profile a compiler that runs for about 10 minutes.
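The trade-off is easy to estimate.  The sketch below assumes one-byte counters that wrap after 255, as in the Profiler described later in this article, and an invented workload in which 90% of the samples fall into a single bin.

```python
def seconds_until_wrap(interrupt_hz, hot_fraction, counter_max=255):
    """Seconds until the busiest counter wraps past counter_max."""
    hits_per_second = interrupt_hz * hot_fraction
    return (counter_max + 1) / hits_per_second

# hot_fraction = 0.9 is an assumption, borrowed from the 90/10 rule of thumb.
for hz in (1000, 60):
    print(hz, "Hz:", round(seconds_until_wrap(hz, 0.9), 2), "seconds")
```

Under those assumptions a 1000 Hz interrupt fills the busiest counter in well under a second, while a 60 Hz interrupt gives a little under five seconds of measurement.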

If all you have available is a high-frequency signal, it's easy enough to divide it down to something usable.  Just initialize a counter in the setup portion of the program to the necessary value.  Then whenever an interrupt occurs, decrement the counter.  Most of the time the counter will be non-zero, so then branch directly to the exit portion of the handler.  When the counter reaches zero, go ahead and do the full interrupt processing and then reset the counter.
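The divide-down logic can be sketched as follows, in Python for readability.  The class name, the divisor of 16, and the stand-in handler are illustrative choices, not part of any clock card's interface.

```python
class DividedInterrupt:
    """Run the full handler only once every `divisor` interrupts."""

    def __init__(self, divisor, handler):
        self.divisor = divisor
        self.remaining = divisor      # initialized in the setup portion
        self.handler = handler

    def on_interrupt(self):
        self.remaining -= 1
        if self.remaining != 0:
            return                    # most interrupts take this quick exit
        self.handler()                # full interrupt processing
        self.remaining = self.divisor # reset the counter for the next round

hits = []
clock = DividedInterrupt(16, lambda: hits.append(1))
for _ in range(1000):                 # e.g. one second of a 1000 Hz signal
    clock.on_interrupt()
print(len(hits))
```

With a divisor of 16, a 1000 Hz source produces 62 full passes through the handler per second, close to the 60 Hz rate used later in this article.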

"What if I don't have a clock?" you ask.  That is exactly the problem I had when I started thinking about this project.  Then I ran across an article in the July '83 issue of Micro in which Charles Putney (a subscriber and sometime contributor to these pages) told how to get a 60 Hz signal to the interrupt line.  Charles' article tells how to use that signal to implement a real-time clock, but it seemed to me that here was exactly the interrupt signal I had been seeking for my profiler.

All you need to do is add a wire inside your Apple, from pin 11 of the 74LS161 at coordinate D11 to pin 4 of the 6502.  I used a pair of plunger clips (Radio Shack #270-370, the smallest ones) to attach the wires, and also put a pushbutton in the circuit.  When attaching the clips to the IC pins, be EXTREMELY careful not to short any adjacent pins, and try to arrange things so that the wire doesn't wobble around.  TURN THE POWER OFF BEFORE MESSING WITH WIRING INSIDE YOUR COMPUTER.

Here's a drawing that shows where to connect the wires:

[Drawing: wire from pin 11 of the 74LS161 at D11 to pin 4 of the 6502]

Note that the photograph in the Micro article does NOT show the correct pins.  The description in Putney's text is correct, but whoever did the photo artwork garbled it.

The signal we are borrowing is one of the video timing signals, called V5.  V5 is normally high (~5 volts).  It goes low every 1/60th of a second, and stays low for about 380 microseconds.  That's a pretty good interrupt signal, but we're going to have to allow time for V5 to get back to its high state before we return to the main program, or we'll get more than one interrupt per cycle.


The program

When you BRUN or CALL Profiler, lines 1120-1130 hook the Initialize portion of the program into the monitor's CTRL-Y vector.

Initialize first connects the Handler routine to the IRQ vector (1190-1220).  It then gets the starting and ending addresses from where the Monitor left them, takes the difference and divides by 16 to get the size of each bin (1270-1390).

Build Table then starts the table with the Start address and loops to set each table entry Step bytes larger than the previous one (1420-1650).  At the same time the routine sets the count for each bin to zero and adds an extra zero byte after the count (1610-1630).  This extra byte makes the table easier to read with a Monitor memory dump.

Note that the last entry is set to the End value, rather than a calculated step (1670-1700).  This makes the last bin larger than the others, by somewhere between 0 and 15 bytes, to compensate for the remainder left behind when we divided to get Step.
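The table arithmetic can be modeled in a few lines of Python.  This is only a sketch of the calculation described above: it keeps the boundaries and counts in separate lists and ignores the actual byte layout of the table in memory.  The sample range $0800-$0FFF is an arbitrary choice.

```python
BINS = 16

def build_table(start, end):
    """Return (step, boundaries, counts) for sixteen bins between start and end."""
    step = (end - start) // BINS          # integer divide; the remainder is dropped
    boundaries = [start + i * step for i in range(BINS)]
    boundaries.append(end)                # last entry is End itself, not a calculated step
    counts = [0] * BINS                   # every bin starts with a zero count
    return step, boundaries, counts

step, boundaries, counts = build_table(0x0800, 0x0FFF)
last_bin_width = boundaries[16] - boundaries[15]
print(step, last_bin_width - step)        # bin size, and the extra bytes in the last bin
```

For this range Step is 127 bytes and the last bin absorbs the 15 bytes of remainder.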

Now we come to the Handler itself.  When an IRQ interrupt occurs we first save all the registers on the stack (1740-1790).  The next step is to extract the Program Counter value from inside the stack and save it (1800-1840).

The next step is checking that PC value to see if it is inside the range we want (1860-1920).  If not, go on to Exit.

If we are in range, search down the table to find the right bin (1930-1980) and register the count (2000-2040).  Since the counters only go up to 255, I have the profiler stop when one of them wraps around (2070-2080).
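In outline, the range check, table search, and counting step look something like the Python sketch below.  It is a model of the logic, not a translation of the listing: the inclusive range test, the exception used to represent the break into the Monitor, and the sample table are assumptions made for the illustration.

```python
class CounterWrapped(Exception):
    """Stands in for the profiler stopping when a counter wraps."""

def record_hit(pc, boundaries, counts):
    """Count one sampled PC value; return the bin index, or None if out of range."""
    if pc < boundaries[0] or pc > boundaries[-1]:
        return None                               # not in the measured area: just exit
    for index in range(len(counts) - 1, -1, -1):  # search down from the top bin
        if pc >= boundaries[index]:
            break
    counts[index] = (counts[index] + 1) & 0xFF    # one-byte counter
    if counts[index] == 0:
        raise CounterWrapped(index)               # passed 255: stop profiling
    return index

# Hypothetical table: sixteen 127-byte bins from $0800, last entry $0FFF.
boundaries = [0x0800 + i * 127 for i in range(16)] + [0x0FFF]
counts = [0] * 16
print(record_hit(0x0900, boundaries, counts), record_hit(0x0700, boundaries, counts))
```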

The Exit routine includes a delay loop (2120-2140) to make sure that the Handler takes at least 380 microseconds.  This ensures that the V5 line we're using for an interrupt source has gone back high, and won't interrupt again as soon as the RTI is done.  If you're lucky enough to be getting your interrupts from a clock card you won't need this loop, but you will need to do something to tell the card that you're done with this interrupt.  Check your clock manual.  Profiler then ends by restoring the registers and doing an RTI.
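For a feel for the numbers, the following Python sketch estimates how many passes a simple countdown loop would need to cover the whole 380 microseconds by itself.  Both constants are assumptions: a nominal CPU clock of about 1.023 MHz, and a two-instruction DEX/BNE loop costing 5 cycles per pass.  The loop in the listing may be written differently, and it only has to make up whatever time the rest of the Handler has not already used.

```python
import math

CLOCK_HZ = 1_023_000      # assumed nominal Apple II CPU clock
CYCLES_PER_PASS = 5       # assumed: DEX (2 cycles) + BNE taken (3 cycles)

def delay_passes(microseconds):
    """Loop passes needed to burn at least the given number of microseconds."""
    cycles = math.ceil(microseconds * CLOCK_HZ / 1_000_000)
    return math.ceil(cycles / CYCLES_PER_PASS)

passes = delay_passes(380)
print(passes)
```

Under these assumptions about 78 passes are enough, which fits comfortably in a single one-byte loop counter.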

The Compare Entry routine (2220-2270) just compares the PC value to the current table value.

The funny-looking FILLER space (line 2340) makes sure that the Table begins on a new line in the Monitor memory dump, keeping things easy to read.


Using Profiler

When I want to profile a program, I first assemble Profiler to run somewhere out of the way above or below the program I want to test.  Then I enter the Monitor and type addrG to connect Profiler to CTRL-Y.  Next I enter addr1.addr2^Y (that's <Start-address>.<End-address><CTRL-Y>) to initialize things.

The next step is to start the program I want to measure, and then start the interrupts coming.  My system has a pushbutton between the 60 Hz source and the IRQ line, so I just hold the button down for the period I want to check on.  If you're using a clock card you can probably insert instructions into your program to start and stop the interrupts at the points you want.

If one of the counters passes 255, Profiler will Break into the Monitor.  Otherwise, get into the Monitor after your program has finished and examine the table.  There's a record of exactly where your program has been.

In a large program, the bin with the highest count may be too wide to really tell where the bottleneck is.  If so, just use the CTRL-Y command again to profile only the bin that had the largest count.  This will divide that section into 16 segments so you can see more detail.
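Picking the addresses for that second pass is simple bookkeeping.  The Python sketch below assumes the table has been copied out as a list of seventeen boundaries and sixteen counts; the sample values are invented.

```python
def hottest_bin_range(boundaries, counts):
    """Return (start, end) addresses of the bin with the largest count."""
    index = max(range(len(counts)), key=counts.__getitem__)
    start = boundaries[index]
    if index == len(counts) - 1:
        end = boundaries[-1]              # last bin runs up to End itself
    else:
        end = boundaries[index + 1] - 1   # byte just before the next bin starts
    return start, end

# Hypothetical first pass over $0800-$0FFF with most hits in the third bin.
boundaries = [0x0800 + i * 127 for i in range(16)] + [0x0FFF]
counts = [3, 0, 200, 1, 0, 0, 12, 0, 0, 0, 0, 4, 0, 0, 0, 0]
start, end = hottest_bin_range(boundaries, counts)
command = f"{start:04X}.{end:04X}"        # type this, then CTRL-Y
print(command)
```

Here the second pass would be started with 08FE.097C followed by CTRL-Y.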


Limitations and possible improvements

The profiler described by Leas and Wintz displays the counts as a bar graph, so the largest count really stands out.  My version just leaves the addresses and counts where you can read them with the Monitor, so I'm sure you can come up with ways to improve that.
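As one possible starting point, here is a Python sketch of a text bar graph.  It is not how Leas and Wintz draw theirs; the 40-column width, the output format, and the sample counts are all arbitrary choices for the illustration.

```python
def bar_graph(boundaries, counts, width=40):
    """Return one text line per bin: start address, count, and a scaled bar."""
    peak = max(counts) or 1               # avoid dividing by zero on an empty run
    lines = []
    for address, count in zip(boundaries, counts):
        bar = "*" * (count * width // peak)
        lines.append(f"{address:04X} {count:3d} {bar}")
    return lines

# Hypothetical results for sixteen 127-byte bins starting at $0800.
boundaries = [0x0800 + i * 127 for i in range(16)] + [0x0FFF]
counts = [0] * 16
counts[2], counts[5] = 90, 10
lines = bar_graph(boundaries, counts)
print("\n".join(lines))
```

The busiest bin always gets a full-width bar, so it stands out even when the raw counts are small.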

Sometimes it would be nice to be able to build the address table interactively, rather than having it forced to sixteen equal-sized sections.  Maybe something like entering the starting address you want for each bin, and a zero at the end.
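One way that might work is sketched below in Python.  The entry format (a list of starting addresses ended by a zero) follows the suggestion above; treating the last address entered as the end of the measured range is an added assumption, as are the function names and sample addresses.

```python
from bisect import bisect_right

def read_boundaries(entries):
    """Collect bin starting addresses up to (not including) the terminating zero."""
    boundaries = []
    for address in entries:
        if address == 0:
            break
        boundaries.append(address)
    return sorted(boundaries)

def find_bin(pc, boundaries):
    """Return the index of the bin holding pc, or None if pc is out of range."""
    if pc < boundaries[0] or pc > boundaries[-1]:
        return None
    # The last boundary is the end address, so it belongs to the final bin.
    return min(bisect_right(boundaries, pc) - 1, len(boundaries) - 2)

# Hypothetical entries: a narrow bin around a suspect loop at $0900-$091F.
boundaries = read_boundaries([0x0800, 0x0900, 0x0920, 0x1000, 0])
print(find_bin(0x0910, boundaries), find_bin(0x0F00, boundaries))
```

Unequal bins like these let you put a narrow bin around a suspect loop without giving up coverage of the rest of the program.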


The DOS problem

There has always been a problem with using interrupts in the Apple II under DOS 3.3, but the solutions are now pretty well known.  Elsewhere in this issue we cover the DOS or Monitor patches necessary to use the 6502's IRQ interrupt without trouble.  This program assumes that all that has been taken care of, or that you don't care.


References

1.  "Chisel Your Code with a Profiler"  Dennis Leas & Paul Wintz.  Byte Magazine.  August, 1983, pp 286-290.
2.  "A Clock Interrupt for Your Apple"  Charles Putney.  Micro Magazine.  July, 1983, pp 36-41.
